Automatic Corpus-Based Extraction of Chinese Legal Terms

Authors

  • Oi Yee Kwong
  • Benjamin Ka-Yin T'sou
Abstract

This paper reports on a study of the automatic extraction of Chinese legal terms. We used a word-segmented corpus of Chinese court judgments to extract salient legal expressions with standard collocation learning techniques. Our method takes the characteristics of Chinese legal terms into account. The extracted terms were evaluated by human markers and compared against a legal term glossary manually compiled from the same data. Results show that at least 50% of the extracted terms are legally salient, so they may supplement manual compilation and reduce the inconsistency of human effort. Moreover, various types of significant knowledge about the legal context are mined from the data as a by-product.
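The abstract does not say which collocation measure was used, so the following is only a minimal sketch of one common choice, pointwise mutual information (PMI) over adjacent word pairs in a word-segmented corpus. The function name `extract_collocations`, the `min_count` threshold, and the toy sentences are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def extract_collocations(sentences, min_count=2, top_n=10):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    `sentences` is a list of token lists, i.e. text that has already
    been word-segmented. PMI is an assumed scoring measure here; the
    paper only mentions "standard collocation learning techniques".
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Adjacent pairs within one sentence only.
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for very rare pairs
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))

    # Highest-scoring pairs first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy, invented data standing in for segmented court judgments.
    toy_corpus = [
        ["原告", "提出", "上訴"],
        ["被告", "提出", "上訴"],
        ["法庭", "駁回", "上訴"],
        ["法庭", "駁回", "申請"],
    ]
    for pair, score in extract_collocations(toy_corpus):
        print("".join(pair), round(score, 3))
```

In a setting like the one described, the ranked candidates would then be passed to human markers or checked against a manually compiled glossary. Other association measures, such as the log-likelihood ratio, could be substituted for the PMI score without changing the surrounding structure.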


Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...


Chinese Sketch Engine and the Extraction of Grammatical Collocations

This paper introduces a new technology for collocation extraction in Chinese. Sketch Engine (Kilgarriff et al., 2004) has proven to be a very effective tool for automatic description of lexical information, including collocation extraction, based on large-scale corpus. The original work of Sketch Engine was based on BNC. We extend Sketch Engine to Chinese based on Gigaword corpus from LDC. We d...


Measuring precision in legal term mining: a corpus-based validation of single and multi-word term recognition methods

Legal terminology presents certain traits which may interfere with its automatic detection such as its relevant presence in everyday language. Thus, this research explores the levels of precision achieved by five single and multi-word term recognition methods on a pilot legal corpus of 2.6 million words. A comparison is carried out with the results presented by Marín (2014a). Once the most effe...


Extraction of Bilingual Technical Terms for Chinese-Japanese Patent Translation

The translation of patents or scientific papers is a key issue that should be helped by the use of statistical machine translation (SMT). In this paper, we propose a method to improve Chinese–Japanese patent SMT by premarking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic fil...


Constructing an IR-oriented legal ontology

We introduce in this paper a general method to elaborate an information retrieval oriented legal ontology. An ontology is [Gruber] “ [...] an explicit specification of a conceptualization. [...] A conceptualization is defined by the objects concepts and other entities that are presumed to exist in some area of interest and the relationships that hold among them. Since the set of objects and rel...




Publication date: 2001